In this exploratory data analysis, our group examines dengue fever cases in the city of Mérida, Mexico, using data provided by Professor Clennon of the Emory Environmental Science Department. Dengue fever is a mosquito-borne viral infection caused by the dengue virus, which poses a significant public health concern in tropical and subtropical regions.
Some of dengue fever’s most common symptoms include sudden drops in blood pressure, nausea, and muscle pain. Researchers have found that Merida, the capital of the state of Yucatán, experiences recurrent dengue outbreaks due to its tropical climate, seasonal rainfall and standing water, and dense urban population density.
Our group is interested in public health, epidemiology, and the positive effect that well-developed data analysis can have on urban welfare. In part, this interest is a result of the COVID-19 pandemic and widespread reporting on its pathology, treatment, and mitigation. Dengue’s effects are similar to COVID-19’s, and the disease similarly impacts communities worldwide. However, its relatively slower spread and its prevalence predominantly in tropical areas with underdeveloped healthcare infrastructure have led to severe neglect compared to other dangerous diseases. This study is primarily an exercise in geospatial and temporal analysis of viral disease.
Our dengue dataset includes 540 rows, each representing a distinct region within Mérida. Each region is quantified based on its coordinates from the Universal Transverse Mercator (UTM) system, a grid-based coordinate system used to map locations using meters. The dataset initially had 8 columns: the X and Y UTM coordinates of each region, SP_ID (an identifier for each region), CVEGEO (geographic identifier code used by the Mexican Geographic Agency), and the number of cases in each region across four different years (2012-2015). As we explain in the Data Cleaning step, SP_ID and CVEGEO are not necessary for our analysis here. Additionally, we have included complementary data on ovitrap egg density (the quantity of mosquito eggs recorded for 4177 ovitraps distributed throughout the city). Although it is not the main focus of our analysis, it provides some insight into potential examination of the relationship between mosquito breeding sites and dengue incidence.
Before developing any visualizations or analytical models, we prepared the R environment by loading our primary dataset, “Merida_Den12_13_14_15.csv,” along with a complete suite of R packages. We used the tidyverse package, which provides core tools like dplyr for data transformation and ggplot2 for static visualization. For our more advanced interactive plots, we loaded plotly (used for the animated density map) and leaflet (used for the interactive Year-over-Year map). We also included sf to read and handle the simple features (geospatial) data for Merida. Once all libraries were loaded, we imported the dataset into a data frame. The resulting data frame contained 540 rows and 8 columns, representing spatial identifiers, coordinate values, and annual dengue case counts. This formed the foundation for our exploratory analysis.
Data Cleaning is a crucial step to ensure the integrity and relevance of the data prior to our analysis. We want to ensure that all the data required for this project is clean, validated, and standardized so that our group can create data visualizations using the shared data.
The raw dataset contains several columns, including SP_ID and CVEGEO, which serve as identifiers for position and case record. After discussing the goals for the project, we determined that our exploratory analysis is focused exclusively on the relationship between spatial coordinates and case counts. Therefore, the important features for this analysis are the X and Y coordinates and the annual case count columns (Den2012 through Den2015). The SP_ID and CVEGEOCVEGEO columns, while potentially useful for future work like joining with demographic or climate data, are not required for our current visualization and modeling goals. Therefore, we dropped the project’s irrelevant columns, creating a standard data frame named data_cleaned.
Next, we checked the dataframe for missing data because null values can introduce significant bias, propagate errors in calculations, and cause visualization failures. The R console output confirms that all columns returned 0 null values. This finding validates the completeness of our dataset, allowing us to proceed with our analysis without further cleaning procedures, ensuring our results are based on the full set of observations.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(leaflet)
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(htmltools)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
data <- read.csv('Merida_Den12_13_14_15.csv')
head(data)
## X Y SP_ID CVEGEO Den2012 Den2013 Den2014 Den2015
## 1 231639.6 2317510 0 3104100010070 17 6 2 2
## 2 231323.1 2316622 1 3104100010085 35 15 10 18
## 3 230769.7 2316113 2 310410001009A 35 14 8 17
## 4 233250.8 2316208 3 3104100010136 40 8 6 10
## 5 233266.3 2317442 4 3104100010140 12 8 4 4
## 6 232736.6 2317238 5 3104100010155 30 8 1 5
data_cleaned <- data %>%
select(X, Y, starts_with("Den"))
na_counts <- colSums(is.na(data_cleaned))
print(na_counts)
## X Y Den2012 Den2013 Den2014 Den2015
## 0 0 0 0 0 0
head(data_cleaned)
## X Y Den2012 Den2013 Den2014 Den2015
## 1 231639.6 2317510 17 6 2 2
## 2 231323.1 2316622 35 15 10 18
## 3 230769.7 2316113 35 14 8 17
## 4 233250.8 2316208 40 8 6 10
## 5 233266.3 2317442 12 8 4 4
## 6 232736.6 2317238 30 8 1 5
### Reorganize the dataset so each year's dengue fever count is stored as a separate row
data_formatted <- data %>%
pivot_longer(
cols = starts_with("Den"),
names_to = "Year",
values_to = "Count"
) %>%
mutate(Year = as.numeric(gsub("Den", "", Year)))
### Summarize the total dengue fever cases per year and calculate Year-over-Year percent change
data_summary <- data_formatted %>%
group_by(Year) %>%
summarise(
total_dengue = sum(Count, na.rm = TRUE))%>%
arrange(Year) %>%
mutate(
percent_change = (total_dengue - lag(total_dengue))/lag(total_dengue) * 100
)
### Visualize total dengue fever cases per year and Year-over-Year percent change
ggplot(data_summary, aes(x = Year)) +
geom_line(aes(y = percent_change * 100), color = "red", size = 1.2) +
scale_y_continuous(
name = "Total dengue Cases",
sec.axis = sec_axis(~./100, name = "% Change from Previous Year")
) +
geom_col(aes(y=total_dengue), fill = "blue")+
theme_minimal(base_size = 12) +
labs(
title = "Total dengue Cases and Year-over-Year % Change in Merida",
x = "Year"
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
When it comes to diseases, outbreaks, and other public health concerns, it is crucial to monitor the volume of cases to understand if the outbreak is being contained. Starting off as simply as possible, we created a first graph that summarizes the broader statistics of our dengue fever dataset in Merida – more specifically, we created a visualization that summarizes the amount of dengue fever cases from 2012 to 2015. This was conducted by grouping the cases by what year they were from, and then counting how many cases were in each year. The graph shows 2012 having the highest volume of dengue fever cases of almost 10,000, which has a significant dropoff in 2013. Following this trend, 2014 also sees a decline in total dengue fever cases, until another sharp increase of cases comes in 2015 of around 7,500 cases.
Despite being able to visually see the fluctuations in volume of dengue fever cases, our group decided to create a percent change, year-over-year, visualization over the same chart. This method of calculating percent change is useful in tracking the rate of increase or decrease of the dengue fever, which allows us to not only understand the spread of the disease, but how quickly it is spreading. By using geom_line in our visualization, the red line helps our audience understand the relative changes in dengue fever cases over the years. When the percent change is sharply negative, like in 2013 and 2014, that could indicate successful containment plans, while sharp positive changes in the red line could signal a public health crisis.
The pertinence of this visualization is that it helps with public health tracking and surveillance. Despite being a more general statistical visualization, this graph can help track the trends of how quickly dengue fever is spreading in Merida, while also assessing the effectiveness of interventions in certain years, assuming that certain confounding variables are controlled for – other variables such as seasonal changes and weather variation could certainly impact our future findings. Ultimately, data tracking leads to action, and in the case of dengue fever, it can lead to mosquito control initiatives or vaccination campaigns.
This initial analysis provides a strong foundation for conducting a geospatial analysis on where these dengue fever cases are specifically occurring.
Before we further analyze dengue incidence over time, let’s briefly look at some data for mosquito ovitraps. Ovitraps allow researchers to estimate the number of mosquitoes in an area by replicating breeding environments and measuring egg count. As dengue is a mosquito-borne illness, we should keep in mind the relative density of mosquito eggs as we look at disease hot spots later on. For the purposes of this analysis, we will not run statistical tests examining the exact assocation between estimated mosquito density and dengue incidence, as we intend to focus more on geospatial and temporal analytics.
map <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source
## `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
ovitraps <- st_read("Merida_S25/Ovitraps_Sum/Ovitraps_Sum.shp")
## Reading layer `Ovitraps_Sum' from data source
## `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Ovitraps_Sum/Ovitraps_Sum.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 4177 features and 50 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 219360.9 ymin: 2312502 xmax: 233659.3 ymax: 2329610
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
# Create the plot with a gradient from yellow to red, representing relatively low to relatively high egg density. Overlay the plot on a blank regional map of Merida.
ggplot() +
geom_sf(data = map, color = "black", linewidth = 0.3) + # Create map outline
geom_sf(data = ovitraps, aes(color = Total), size = 2, alpha = 0.9) + # Generate ovitrap points
scale_color_gradient(low = "yellow", high = "red", name = "Eggs per Trap") + # Add color gradient to represent eggs per trap
labs(title = "Ovitrap Egg Counts in Merida") # Add title
# Reshape dataframe from wide to long format, each year becomes a separate row with 'Cases' column
data_long <- data_cleaned %>%
pivot_longer(
cols = starts_with("Den"), # select all year columns
names_to = "Year", # new column for year
names_prefix = "Den", # remove 'Den' prefix
values_to = "Cases" # new column for case counts
) %>%
# Convert year column to integer type
mutate(Year = as.integer(Year))
# Create animated scatter plot of dengue cases
fig <- plot_ly(
data_long, # use long-format dataframe
x = ~X,
y = ~Y,
frame = ~Year, # animate by year
type = 'scatter',
mode = 'markers',
color = ~Cases, # color by case count
colors = "Reds", # color scale
marker = list(
size = 10,
line = list(width = 0.3, color = 'black') # add thin black border
),
# Add hover text with detailed info
text = ~paste(
"Year:", Year,
"<br>Cases:", Cases,
"<br>X:", round(X, 2),
"<br>Y:", round(Y, 2)
),
hoverinfo = 'text'
) %>%
# Add color bar to indicate case counts
colorbar(title = "Annual Case Count") %>%
# Update layout, axis titles, and background colors
layout(
title = 'Dengue Fever Density in Merida (2012–2015)',
xaxis = list(title = 'X Coordinate (UTM in Meters)'),
yaxis = list(title = 'Y Coordinate (UTM in Meters)'),
plot_bgcolor = "white",
paper_bgcolor = "white"
) %>%
# Set animation speed and transition style
animation_opts(
frame = 3000, # 3 seconds per frame
transition = 800, # 0.8 second transition
easing = "linear"
) %>%
# Add animation slider for selecting year
animation_slider(
currentvalue = list(
prefix = "Year: ",
font = list(color = "red")
)
)
fig
We can further analyze the evolution dengue virus in the Merida region over a period of time with an animated density chart. By using Plotly, we can plot the position, measured in UTM coordinates (meters), of the annual case counts (2012-2015), allowing us to visualize high-density infected regions in red and low-density regions in white. The animation illustrates how the spatial distribution of the disease fluctuates across the same areas throughout the analyzed timespan.
When analyzing the density chart in 2012, we can see a significant outbreak as there are two primary hotspot regions with annual case counts exceeding 100. It is important to note that these hotspots are not isolated. They are surrounded by dengue-infested locations reporting nearly 50 annual cases, indicating a concentrated and widespread initial event.
If we advance to 2013, we can see a surprising city-wide reduction in dengue incidence. There are far fewer cases overall, with no single region in Merida reporting over 100 annual cases and only a handful approaching 50 cases. Overall, the area is dominated by low-incidence areas, suggesting the outbreak has died down. This trend continues into 2014, which shows an even more pronounced decline as the vast majority of regions report zero cases, and the most significant hotspot contains only around 30 cases.
However, in 2015, we can see a powerful resurgence. While many areas remain low-incidence, two new, intense hotspots emerge with annual case counts approaching 150. Additionally, the space between these two hotspots contains a dense, concentrated cluster of locations reporting nearly 100 annual cases each.
To sum up, this visualization demonstrates that disease spread in this urban environment is not random, but rather persistent and spatially clustered in specific, identifiable high-risk zones. The 2015 outbreak also demonstrates that new high-incidence areas are likely to appear adjacent to or within previous outbreak zones, confirming the geospatial nature of disease spread, which may be influenced by environmental or population-related factors.
# 1. Convert data to sf and transform to WGS84 for Leaflet
# We use EPSG:6371 as identified in the original Section 7
data_sf <- st_as_sf(data, coords = c("X", "Y"), crs = 6371)
data_sf_wgs84 <- st_transform(data_sf, crs = 4326)
# 2. Load and transform the neighborhood shapefile
town_map_raw <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source
## `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
town_map_wgs84 <- st_transform(town_map_raw, crs = 4326)
# 3. Create a sequential color palette for absolute cases
# We find the global max to keep the scale consistent across years
all_cases <- c(data_sf_wgs84$Den2012, data_sf_wgs84$Den2013, data_sf_wgs84$Den2014, data_sf_wgs84$Den2015)
global_max_cases <- max(all_cases, na.rm = TRUE)
pal_cases <- colorNumeric(
palette = "Reds",
domain = c(0, global_max_cases),
na.color = "#a0a0a0"
)
# 4. Create popups for each year
popup_2012 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2012 Cases: %d", SP_ID, Den2012), HTML)
popup_2013 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2013 Cases: %d", SP_ID, Den2013), HTML)
popup_2014 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2014 Cases: %d", SP_ID, Den2014), HTML)
popup_2015 <- ~lapply(sprintf("<strong>ID: %s</strong><br/>2015 Cases: %d", SP_ID, Den2015), HTML)
# 5. Create the interactive leaflet map
m_cases <- leaflet(data_sf_wgs84) %>%
# Add base map layers
addProviderTiles(providers$CartoDB.Positron, group = "Light Map") %>%
addProviderTiles(providers$Esri.WorldImagery, group = "Satellite") %>%
# Add Neighborhoods layer
addPolygons(
data = town_map_wgs84,
group = "Neighborhoods",
fillOpacity = 0.1,
color = "#444",
weight = 1,
opacity = 0.7,
popup = ~SP_ID
) %>%
# Set the map to focus on the data
fitBounds(
~as.numeric(st_bbox(geometry)[1]),
~as.numeric(st_bbox(geometry)[2]),
~as.numeric(st_bbox(geometry)[3]),
~as.numeric(st_bbox(geometry)[4])
) %>%
# ==== Layer 1: 2012 Cases ====
addCircleMarkers(
group = "2012 Cases",
color = ~pal_cases(Den2012),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2012,
label = ~as.character(Den2012)
) %>%
# ==== Layer 2: 2013 Cases ====
addCircleMarkers(
group = "2013 Cases",
color = ~pal_cases(Den2013),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2013,
label = ~as.character(Den2013)
) %>%
# ==== Layer 3: 2014 Cases ====
addCircleMarkers(
group = "2014 Cases",
color = ~pal_cases(Den2014),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2014,
label = ~as.character(Den2014)
) %>%
# ==== Layer 4: 2015 Cases ====
addCircleMarkers(
group = "2015 Cases",
color = ~pal_cases(Den2015),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2015,
label = ~as.character(Den2015)
) %>%
# ==== Legend ====
addLegend(
pal = pal_cases,
values = c(0, global_max_cases),
title = "Case Count",
position = "bottomright",
opacity = 1
) %>%
# ==== Layer Control ====
addLayersControl(
baseGroups = c("2012 Cases", "2013 Cases", "2014 Cases", "2015 Cases"),
overlayGroups = c("Light Map", "Satellite", "Neighborhoods"),
options = layersControlOptions(collapsed = FALSE)
)
# Display the map
m_cases
While the animated density plot in the previous section provided an excellent overview of the temporal flow, this interactive leaflet map offers another perspective that adds more context for our analysis. By overlaying the case data on real world maps of both the light map as well as the satelite imagery map and adding the ‘Neighborhoods’ shapefile, we can move from abstract coordinates to tangible locations. The layer controls which function as a set of radio buttons to select the basemap to have the data overlayed upon allow for a contextually rich understanding of absolute case counts for each year.
Using this map confirms and deepens our previous findings. The 2012 Cases layer shows the initial outbreak’s hot spots clearly, colored in dark red. Looking at the “Neighborhoods” overlay reveals these are concentrated in specific districts. When switching to the 2013 Cases and 2014 Cases layers, we can observe the city wide decline. The dark red dots (high case counts) almost vanish from the map, reinforcing the narrative of a successful containment period that was first identified in the aggregate bar chart.
The 2015 Cases layer highlights the dramatic resurgence. We can now pinpoint the exact neighborhoods that experienced this new outbreak, which appear to be both a re emergence in some of the 2012 hot spots and an expansion into new adjacent areas. This map is invaluable for understanding the where of the outbreak. However, it only shows absolute counts. To understand the speed and volatility of the spread, we next need to analyze the relative change from one year to the next.
# 1. Calculate Year-over-Year (YoY) percentage change
# We must handle cases where the denominator (previous year) is 0.
calculate_yoy <- function(current, previous) {
case_when(
previous == 0 & current == 0 ~ 0, # 0 to 0 is 0% change
previous == 0 & current > 0 ~ Inf, # 0 to N is infinite % change
previous > 0 ~ (current - previous) / previous
)
}
data_yoy <- data %>%
mutate(
YoY_2013 = calculate_yoy(Den2013, Den2012),
YoY_2014 = calculate_yoy(Den2014, Den2013),
YoY_2015 = calculate_yoy(Den2015, Den2014)
)
# 2. Clean up Inf/-Inf/NaN values for visualization. We'll set Inf to NA.
data_yoy <- data_yoy %>%
mutate(across(starts_with("YoY_"), ~if_else(is.finite(.), ., NA_real_)))
# 3. Convert to a geospatial 'sf' object and transform coordinates
# The .prj file (EPSG:6371) tells us this is Mexico ITRF2008 / UTM zone 16N.
# Leaflet requires standard WGS84 (EPSG:4326) coordinates (lat/lon).
data_sf <- st_as_sf(data_yoy, coords = c("X", "Y"), crs = 6371)
data_sf_wgs84 <- st_transform(data_sf, crs = 4326)
# 4. Define a diverging color palette
# We cap the domain from -1 (-100%) to 2 (200%) to handle outliers
cap_range <- c(-1, 2)
pal_yoy <- colorNumeric(
palette = "RdYlBu",
domain = cap_range,
na.color = "#a0a0a0",
reverse = TRUE # Make red positive (increase), blue negative (decrease)
)
# 5. Create formatted popup and label content
# Popups appear on click
popup_2013 <- ~lapply(sprintf(
"<strong>Location ID: %s</strong><br/>2012 Cases: %d<br/>2013 Cases: %d<br/>YoY Change: %s",
SP_ID, Den2012, Den2013, scales::percent(YoY_2013, accuracy = 0.1)
), HTML)
popup_2014 <- ~lapply(sprintf(
"<strong>Location ID: %s</strong><br/>2013 Cases: %d<br/>2014 Cases: %d<br/>YoY Change: %s",
SP_ID, Den2013, Den2014, scales::percent(YoY_2014, accuracy = 0.1)
), HTML)
popup_2015 <- ~lapply(sprintf(
"<strong>Location ID: %s</strong><br/>2014 Cases: %d<br/>2015 Cases: %d<br/>YoY Change: %s",
SP_ID, Den2014, Den2015, scales::percent(YoY_2015, accuracy = 0.1)
), HTML)
# Labels appear on hover
label_2013 <- ~scales::percent(YoY_2013, accuracy = 0.1)
label_2014 <- ~scales::percent(YoY_2014, accuracy = 0.1)
label_2015 <- ~scales::percent(YoY_2015, accuracy = 0.1)
# Load and transform the neighborhood shapefile
town_map_raw <- st_read("Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp")
## Reading layer `Merida_Den12_13_14_15' from data source
## `/Users/amanshaik/Documents/GitHub/dengueAnalysisTechWriting/Merida_S25/Merida_Den12_13_14_15/Merida_Den12_13_14_15.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 540 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 212287.2 ymin: 2309495 xmax: 236386 ymax: 2333181
## Projected CRS: Mexico ITRF2008 / UTM zone 16N
town_map_wgs84 <- st_transform(town_map_raw, crs = 4326)
# Create the interactive leaflet map
m <- leaflet(data_sf_wgs84) %>%
# Add base map layers
addProviderTiles(providers$CartoDB.Positron, group = "Light Map") %>%
addProviderTiles(providers$Esri.WorldImagery, group = "Satellite") %>%
# Add Neighborhoods layer
addPolygons(
data = town_map_wgs84,
group = "Neighborhoods",
fillOpacity = 0.1,
color = "#444",
weight = 1,
opacity = 0.7,
popup = ~~SP_ID
) %>%
# Set the map to focus on the data
fitBounds(
~as.numeric(st_bbox(geometry)[1]),
~as.numeric(st_bbox(geometry)[2]),
~as.numeric(st_bbox(geometry)[3]),
~as.numeric(st_bbox(geometry)[4])
) %>%
# ==== Layer 1: 2013 vs 2012 ====
addCircleMarkers(
group = "2013 vs 2012",
color = ~pal_yoy(YoY_2013),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2013,
label = label_2013
) %>%
# ==== Layer 2: 2014 vs 2013 ====
addCircleMarkers(
group = "2014 vs 2013",
color = ~pal_yoy(YoY_2014),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2014,
label = label_2014
) %>%
# ==== Layer 3: 2015 vs 2014 ====
addCircleMarkers(
group = "2015 vs 2014",
color = ~pal_yoy(YoY_2015),
fillOpacity = 0.8,
stroke = FALSE,
radius = 5,
popup = popup_2015,
label = label_2015
) %>%
# ==== Legend ====
addLegend(
pal = pal_yoy,
values = cap_range,
title = "YoY % Change",
position = "bottomright",
labFormat = labelFormat(suffix = "%", transform = function(x) x * 100),
opacity = 1
) %>%
# ==== Layer Control ====
addLayersControl(
baseGroups = c("2013 vs 2012", "2014 vs 2013", "2015 vs 2014"),
overlayGroups = c("Light Map", "Satellite", "Neighborhoods"),
options = layersControlOptions(collapsed = FALSE)
)
## Warning in pal_yoy(YoY_2013): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2013): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2014): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2014): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2015): Some values were outside the color scale and will
## be treated as NA
## Warning in pal_yoy(YoY_2015): Some values were outside the color scale and will
## be treated as NA
# Display the map
m
It is not enough to only look at the absolute numbers for the outbreak of dengue fever. We must also look at the changes through time to identify whether there are any temporal trends. Health officals would want to measure key insights such as 50% decrease in a major hotspot or a 500% increase in a new previously safe area. This map visualizes the Year over Year YoY percentage change to provide these temporal insights. We represent these changes by using the color pallets of data points on the map with red points to signify a percentage increase in worsening outbreak while using blue points to show a percentage decrease which represents improving situation compared to the previous year. Finally, grey points indicate areas with no change or no cases in either period of the outbreak.
The first layer 2013 vs 2012 visualizes the success of containment efforts after the 2012 outbreak. These trends agree with our findings in section 3 of our analysis where we see a massive decrease in the amount of outbreaks which is also represented here with large quantities of blue points. The 2014 vs 2013 layer shows a more mixed but still generally positive story. Many areas that were blue in the previous layer are now grey indicating they went from some cases to zero or stayed at zero. The overall trend remains one of decline or stabilization which is as expected as the amount decrease is limited compared to the first outbreak year.
The 2015 vs 2014 layer though is showing a different story that starkly contrasts the previous trends. The map is now covered in bright red dots with many representing infinite or extremely high percentage increases as cases jumped from near 0 in 2014 to significant numbers in 2015. This visualization starkly contrasts with the absolute case map of Section 6. While that map showed where the 2015 hotspots were, this map shows the surprising amount of increase of dengue fever occurences representing another outbreak. This highlights the virus’s ability to resurge unexpectedly turning previously low risk zones into new hotspots quickly.
In our exploratory analysis, we successfully identified the spatial and temporal dynamics of Dengue fever in Mérida from 2012 to 2015. Our data revealed that a significant 2012 outbreak was followed by a two-year period of decline (2013-2014), allowing us to hypothesize that there was a successful containment. However, our findings in the Annual Case Trend Plot (Section 3) and the Animated Spatial Distribution Map (Section 5) indicate there was a powerful and abrupt resurgence in 2015.
Our geospatial analyses demonstrated that this 2015 outbreak was not a random event. The animated map revealed that the new hotspots were persistent and spatially clustered, emerging within or immediately adjacent to the original 2012 outbreak zones. The Year-over-Year (YoY) analysis (Section 7) supports this, showing a volatile spread into previously low-risk areas, highlighting the virus’s ability to resurge. These visualizations strongly support our primary conclusion that Dengue fever spreading in this urban environment is not random but a persistent, spatially driven phenomenon. The analysis of ovitrap data (Section 4) provides a preliminary, data-driven hypothesis for why these high-risk zones correlate with high vector (mosquito) density, suggesting they act as reservoirs for the virus even during “quiet” years.
This exploratory analysis can be further adapted for several specific, data-driven research focuses that move from what happened to why it happened. In future work, predictive modeling can be used to analyze how the 2012 hotspots (Section 5) correlate with the specific locations of the 2015 resurgence. Additionally, with increased access to ovitrap data, we can further analyze the ovitrap egg counts (Section 4) and the Dengue Fever case counts (Section 6) at the same neighborhood level. Having larger datasets with more available factors can also help identify if other variables contributed to the timing of the 2015 resurgence. Access to data on population, monthly precipitation, temperature, and drought regions can help identify specific triggers that activate these persistent, high-risk zones.
Overall, this study highlights the value of integrating temporal and spatial analyses to uncover patterns of disease persistence and resurgence in urban environments, while providing a foundation for data-driven public health and safety strategies.